Search | BVS Regional Portal

1.

Transcriptomic profiling of Rana [Lithobates] catesbeiana back skin during natural and thyroid hormone-induced metamorphosis under different temperature regimes with particular emphasis on innate immune system components.

Corrie, Lorissa M; Kuecks-Winger, Haley; Ebrahimikondori, Hossein; Birol, Inanc; Helbing, Caren C.

Comp Biochem Physiol Part D Genomics Proteomics ; 50: 101238, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38714098

ABSTRACT

As amphibians undergo thyroid hormone (TH)-dependent metamorphosis from an aquatic tadpole to the terrestrial frog, their innate immune system must adapt to the new environment. Skin is a primary line of defense, yet this organ undergoes extensive remodelling during metamorphosis and how it responds to TH is poorly understood. Temperature modulation, which regulates metamorphic timing, is a unique way to uncover early TH-induced transcriptomic events. Metamorphosis of premetamorphic tadpoles is induced by exogenous TH administration at 24 °C but is paused at 5 °C. However, at 5 °C a "molecular memory" of TH exposure is retained that results in an accelerated metamorphosis upon shifting to 24 °C. We used RNA-sequencing to identify changes in Rana (Lithobates) catesbeiana back skin gene expression during natural and TH-induced metamorphosis. During natural metamorphosis, significant differential expression (DE) was observed in >6500 transcripts including classic TH-responsive transcripts (thrb and thibz), heat shock proteins, and innate immune system components: keratins, mucins, and antimicrobial peptides (AMPs). Premetamorphic tadpoles maintained at 5 °C showed 83 DE transcripts within 48 h after TH administration, including thibz which has previously been identified as a molecular memory component in other tissues. Over 3600 DE transcripts were detected in TH-treated tadpoles at 24 °C or when tadpoles held at 5 °C were shifted to 24 °C. Gene ontology (GO) terms related to transcription, RNA metabolic processes, and translation were enriched in both datasets and immune related GO terms were observed in the temperature-modulated experiment. Our findings have implications on survival as climate change affects amphibia worldwide.

2.

ntEmbd: Deep learning embedding for nucleotide sequences.

Hafezqorani, Saber; Nip, Ka Ming; Birol, Inanc.

bioRxiv ; 2024 May 02.

Article in English | MEDLINE | ID: mdl-38746190

ABSTRACT

Enabled by the explosion of data and substantial increase in computational power, deep learning has transformed fields such as computer vision and natural language processing (NLP) and has been applied successfully to many transcriptomic analysis tasks. A core advantage of deep learning is its inherent capability to incorporate feature computation within the machine learning models. This results in a comprehensive and machine-readable representation of sequences, facilitating the downstream classification and clustering tasks. Compared to machine translation problems in NLP, feature embedding is particularly challenging for transcriptomic studies as the sequences are strings thousands of nucleotides long, which makes the long-term dependencies between features from different parts of the sequence even more difficult to capture. This highlights the need for nucleotide sequence embedding methods that are capable of learning input sequence features implicitly. Here we introduce ntEmbd, a deep learning embedding tool that captures dependencies between different features of the sequences and learns a latent representation for given nucleotide sequences. We further provide two sample use cases, describing how learned RNA features can be used in downstream analysis. The first use case demonstrates ntEmbd's utility in classifying coding and noncoding RNA benchmarked against existing tools, and the second one explores the utility of learned representations in identifying adapter sequences in nanopore RNA-seq reads. The tool as well as the trained models are freely available on GitHub at https://github.com/bcgsc/ntEmbd.

3.

Long-insert sequence capture detects high copy numbers in a defence-related beta-glucosidase gene ßglu-1 with large variations in white spruce but not Norway spruce.

Hung, Tin Hang; Wu, Ernest T Y; Zeltins, Pauls; Jansons, Aris; Ullah, Aziz; Erbilgin, Nadir; Bohlmann, Joerg; Bousquet, Jean; Birol, Inanc; Clegg, Sonya M; MacKay, John J.

BMC Genomics ; 25(1): 118, 2024 Jan 27.

Article in English | MEDLINE | ID: mdl-38281030

ABSTRACT

Conifers are long-lived and slow-evolving, thus requiring effective defences against their fast-evolving insect natural enemies. The copy number variation (CNV) of two key acetophenone biosynthesis genes Ugt5/Ugt5b and ßglu-1 may provide a plausible mechanism underlying the constitutively variable defence in white spruce (Picea glauca) against its primary defoliator, spruce budworm. This study develops a long-insert sequence capture probe set (Picea_hung_p1.0) for quantifying copy number of ßglu-1-like, Ugt5-like genes and single-copy genes on 38 Norway spruce (Picea abies) and 40 P. glauca individuals from eight and nine provenances across Europe and North America respectively. We developed local assemblies (Piabi_c1.0 and Pigla_c.1.0), full-length transcriptomes (PIAB_v1 and PIGL_v1), and gene models to characterise the diversity of ßglu-1 and Ugt5 genes. We observed very large copy numbers of ßglu-1, with up to 381 copies in a single P. glauca individual. We observed among-provenance CNV of ßglu-1 in P. glauca but not P. abies. Ugt5b was predominantly single-copy in both species. This study generates critical hypotheses for testing the emergence and mechanism of extreme CNV, the dosage effect on phenotype, and the varying copy number of genes with the same pathway. We demonstrate new approaches to overcome experimental challenges in genomic research in conifer defences.

Subjects

Picea , Humans , Picea/genetics , Picea/metabolism , DNA Copy Number Variations , beta-Glucosidase/genetics , Genomics , Transcriptome

4.

Establishing association between HLA-C*04:01 and severe COVID-19.

Warren, René L; Abraham, Rohan; Calingo, Marc; Garant, Jean-Michel; Jones, Steven J M; Birol, Inanc.

HLA ; 103(1): e15355, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38273454

Subjects

COVID-19 , Humans , HLA-C Antigens/genetics , Alleles , SARS-CoV-2 , Gene Frequency

5.

aaHash: recursive amino acid sequence hashing.

Wong, Johnathan; Kazemi, Parham; Coombe, Lauren; Warren, René L; Birol, Inanç.

Bioinform Adv ; 3(1): vbad162, 2023.

Article in English | MEDLINE | ID: mdl-38023332

ABSTRACT

Motivation: K-mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait leveraged for nucleic acid sequences in earlier work. We note that amino acid sequences, with complexities and context that cannot be captured by generic hashing algorithms, can also benefit from a domain-specific hashing algorithm. Such a hashing algorithm can accelerate and improve the sensitivity of bioinformatics applications developed for protein sequences. Results: Here, we present aaHash, a recursive hashing algorithm tailored for amino acid sequences. This algorithm utilizes multiple hash levels to represent biochemical similarities between amino acids. aaHash performs ~10× faster than generic string hashing algorithms in hashing adjacent k-mers. Availability and implementation: aaHash is available online at https://github.com/bcgsc/btllib and is free for academic use.
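
Why hashing adjacent k-mers recursively can be faster than rehashing each one may be easier to see with a small example. The sketch below is not the aaHash algorithm itself; it is a generic cyclic-polynomial rolling hash over the 20 standard amino acids, and the per-residue seed table (`SEED`) and function names are assumptions made purely for illustration. Each new k-mer hash is derived from the previous one in constant time rather than in time proportional to k.

```python
import random

MASK = (1 << 64) - 1
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical per-residue 64-bit seeds (illustrative only, not aaHash's tables).
_rng = random.Random(0)
SEED = {aa: _rng.getrandbits(64) for aa in AMINO_ACIDS}


def rol(x, r):
    """Rotate a 64-bit integer left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer):
    """Hash one k-mer from scratch in O(k)."""
    k = len(kmer)
    h = 0
    for i, aa in enumerate(kmer):
        h ^= rol(SEED[aa], k - 1 - i)
    return h


def rolling_hashes(seq, k):
    """Yield the hash of every k-mer in seq, updating in O(1) per step."""
    if len(seq) < k:
        return
    h = hash_kmer(seq[:k])
    yield h
    for i in range(k, len(seq)):
        # Rotate the previous hash, cancel the outgoing residue,
        # then mix in the incoming residue.
        h = rol(h, 1) ^ rol(SEED[seq[i - k]], k) ^ SEED[seq[i]]
        yield h
```

The recursive update yields the same values as hashing every k-mer independently, which is the property that makes the constant-time step safe to use.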

6.

Genomic structures and regulation patterns at HPV integration sites in cervical cancer.

Porter, Vanessa L; O'Neill, Kieran; MacLennan, Signe; Corbett, Richard D; Ng, Michelle; Culibrk, Luka; Hamadeh, Zeid; Iden, Marissa; Schmidt, Rachel; Tsaih, Shirng-Wern; Chang, Glenn; Fan, Jeremy; Nip, Ka Ming; Akbari, Vahid; Chan, Simon K; Hopkins, James; Moore, Richard A; Chuah, Eric; Mungall, Karen L; Mungall, Andrew J; Birol, Inanc; Jones, Steven J M; Rader, Janet S; Marra, Marco A.

bioRxiv ; 2023 Nov 05.

Article in English | MEDLINE | ID: mdl-37961641

ABSTRACT

Human papillomavirus (HPV) integration has been implicated in transforming HPV infection into cancer, but its genomic consequences have been difficult to study using short-read technologies. To resolve the dysregulation associated with HPV integration, we performed long-read sequencing on 63 cervical cancer genomes. We identified six categories of integration events based on HPV-human genomic structures. Of all HPV integrants, defined as two HPV-human breakpoints bridged by an HPV sequence, 24% contained variable copies of HPV between the breakpoints, a phenomenon we termed heterologous integration. Analysis of DNA methylation within and in proximity to the HPV genome at individual integration events revealed relationships between methylation status of the integrant and its orientation and structure. Dysregulation of the human epigenome and neighboring gene expression in cis with the HPV-integrated allele was observed over megabase-ranges of the genome. By elucidating the structural, epigenetic, and allele-specific impacts of HPV integration, we provide insight into the role of integrated HPV in cervical cancer.

7.

Assembly and annotation of the black spruce genome provide insights on spruce phylogeny and evolution of stress response.

Lo, Theodora; Coombe, Lauren; Gagalova, Kristina K; Marr, Alex; Warren, René L; Kirk, Heather; Pandoh, Pawan; Zhao, Yongjun; Moore, Richard A; Mungall, Andrew J; Ritland, Carol; Pavy, Nathalie; Jones, Steven J M; Bohlmann, Joerg; Bousquet, Jean; Birol, Inanç; Thomson, Ashley.

G3 (Bethesda) ; 14(1), 2023 Dec 29.

Article in English | MEDLINE | ID: mdl-37875130

ABSTRACT

Black spruce (Picea mariana [Mill.] B.S.P.) is a dominant conifer species in the North American boreal forest that plays important ecological and economic roles. Here, we present the first genome assembly of P. mariana with a reconstructed genome size of 18.3 Gbp and NG50 scaffold length of 36.0 kbp. A total of 66,332 protein-coding sequences were predicted in silico and annotated based on sequence homology. We analyzed the evolutionary relationships between P. mariana and 5 other spruces for which complete nuclear and organelle genome sequences were available. The phylogenetic tree estimated from mitochondrial genome sequences agrees with biogeography; specifically, P. mariana was strongly supported as a sister lineage to P. glauca and 3 other taxa found in western North America, followed by the European Picea abies. We obtained mixed topologies with weaker statistical support in phylogenetic trees estimated from nuclear and chloroplast genome sequences, indicative of ancient reticulate evolution affecting these 2 genomes. Clustering of protein-coding sequences from the 6 Picea taxa and 2 Pinus species resulted in 34,776 orthogroups, 560 of which appeared to be specific to P. mariana. Analysis of these specific orthogroups and dN/dS analysis of positive selection signatures for 497 single-copy orthogroups identified gene functions mostly related to plant development and stress response. The P. mariana genome assembly and annotation provides a valuable resource for forest genetics research and applications in this broadly distributed species, especially in relation to climate adaptation.
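
For readers unfamiliar with the NG50 statistic quoted above, a minimal sketch of how it is conventionally computed follows. NG50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the estimated genome size. The function name and the toy inputs are illustrative assumptions, not taken from the study.

```python
def ng50(scaffold_lengths, genome_size):
    """Return NG50, or None if the scaffolds cover less than half the genome size."""
    target = genome_size / 2
    total = 0
    # Walk scaffolds from longest to shortest until half the genome is covered.
    for length in sorted(scaffold_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return None
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the estimated genome size, so it stays comparable between assemblies of the same genome.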

Subjects

Picea , Phylogeny , Picea/genetics , North America

8.

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification.

Pardo-Palacios, Francisco J; Wang, Dingjie; Reese, Fairlie; Diekhans, Mark; Carbonell-Sala, Sílvia; Williams, Brian; Loveland, Jane E; De María, Maite; Adams, Matthew S; Balderrama-Gutierrez, Gabriela; Behera, Amit K; Gonzalez, Jose M; Hunt, Toby; Lagarde, Julien; Liang, Cindy E; Li, Haoran; Jerryd Meade, Marcus; Moraga Amador, David A; Prjibelski, Andrey D; Birol, Inanc; Bostan, Hamed; Brooks, Ashley M; Hasan Çelik, Muhammed; Chen, Ying; Du, Mei R M; Felton, Colette; Göke, Jonathan; Hafezqorani, Saber; Herwig, Ralf; Kawaji, Hideya; Lee, Joseph; Liang Li, Jian; Lienhard, Matthias; Mikheenko, Alla; Mulligan, Dennis; Ming Nip, Ka; Pertea, Mihaela; Ritchie, Matthew E; Sim, Andre D; Tang, Alison D; Kei Wan, Yuk; Wang, Changqing; Wong, Brandon Y; Yang, Chen; Barnes, If; Berry, Andrew; Capella, Salvador; Dhillon, Namrita; Fernandez-Gonzalez, Jose M; Ferrández-Peral, Luis.

bioRxiv ; 2023 Jul 27.

Article in English | MEDLINE | ID: mdl-37546854

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

9.

Genomic virulence features of Beauveria bassiana as a biocontrol agent for the mountain pine beetle population.

Li, Janet X; Fernandez, Kleinberg X; Ritland, Carol; Jancsik, Sharon; Engelhardt, Daniel B; Coombe, Lauren; Warren, René L; van Belkum, Marco J; Carroll, Allan L; Vederas, John C; Bohlmann, Joerg; Birol, Inanc.

BMC Genomics ; 24(1): 390, 2023 Jul 10.

Article in English | MEDLINE | ID: mdl-37430186

ABSTRACT

BACKGROUND: The mountain pine beetle, Dendroctonus ponderosae, is an irruptive bark beetle that causes extensive mortality to many pine species within the forests of western North America. Driven by climate change and wildfire suppression, a recent mountain pine beetle (MPB) outbreak has spread across more than 18 million hectares, including areas to the east of the Rocky Mountains that comprise populations and species of pines not previously affected. Despite its impacts, there are few tactics available to control MPB populations. Beauveria bassiana is an entomopathogenic fungus used as a biological agent in agriculture and forestry and has potential as a management tactic for the mountain pine beetle population. This work investigates the phenotypic and genomic variation between B. bassiana strains to identify optimal strains against a specific insect. RESULTS: Using comparative genome and transcriptome analyses of eight B. bassiana isolates, we have identified the genetic basis of virulence, which includes oosporein production. Genes unique to the more virulent strains included functions in biosynthesis of mycotoxins, membrane transporters, and transcription factors. Significant differential expression of genes related to virulence, transmembrane transport, and stress response was identified between the different strains, as well as up to nine-fold upregulation of genes involved in the biosynthesis of oosporein. Differential correlation analysis revealed transcription factors that may be involved in regulating oosporein production. CONCLUSION: This study provides a foundation for the selection and/or engineering of the most effective strain of B. bassiana for the biological control of mountain pine beetle and other insect pest populations.

Subjects

Beauveria , Coleoptera , Animals , Beauveria/genetics , Virulence/genetics , Genomics

10.

aaHash: recursive amino acid sequence hashing.

Wong, Johnathan; Kazemi, Parham; Coombe, Lauren; Warren, René L; Birol, Inanç.

bioRxiv ; 2023 May 10.

Article in English | MEDLINE | ID: mdl-37214907

ABSTRACT

Motivation: K-mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait leveraged for nucleic acid sequences in earlier work. We note that amino acid sequences, with complexities and context that cannot be captured by generic hashing algorithms, can also benefit from a domain-specific hashing algorithm. Such a hashing algorithm can accelerate and improve the sensitivity of bioinformatics applications developed for protein sequences. Results: Here, we present aaHash, a recursive hashing algorithm tailored for amino acid sequences. This algorithm utilizes multiple hash levels to represent biochemical similarities between amino acids. aaHash performs ~10X faster than generic string hashing algorithms in hashing adjacent k-mers. Availability and implementation: aaHash is available online at https://github.com/bcgsc/btllib and is free for academic use.

11.

Linear time complexity de novo long read genome assembly with GoldRush.

Wong, Johnathan; Coombe, Lauren; Nikolic, Vladimir; Zhang, Emily; Nip, Ka Ming; Sidhu, Puneet; Warren, René L; Birol, Inanç.

Nat Commun ; 14(1): 2906, 2023 05 22.

Article in English | MEDLINE | ID: mdl-37217507

ABSTRACT

Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. While read-to-read overlap - its most costly step - was improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, foregoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity. We tested GoldRush on Oxford Nanopore Technologies long sequencing read datasets with different base error profiles sourced from three human cell lines, rice, and tomato. Here, we show that GoldRush achieves assembly scaffold NGA50 lengths of 18.3-22.2, 0.3 and 2.6 Mbp, for the genomes of human, rice, and tomato, respectively, and assembles each genome within a day, using at most 54.5 GB of random-access memory, demonstrating the scalability of our genome assembly paradigm and its implementation.

Subjects

Algorithms , Genome , Humans , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing

12.

Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2.

Nip, Ka Ming; Hafezqorani, Saber; Gagalova, Kristina K; Chiu, Readman; Yang, Chen; Warren, René L; Birol, Inanc.

Nat Commun ; 14(1): 2940, 2023 05 22.

Article in English | MEDLINE | ID: mdl-37217540

ABSTRACT

Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, potentially spanning entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based and, to date, there has been little focus on reference-free transcriptome assembly. We introduce RNA-Bloom2 (https://github.com/bcgsc/RNA-Bloom), a reference-free assembly method for long-read transcriptome sequencing data. Using simulated datasets and spike-in control data, we show that the transcriptome assembly quality of RNA-Bloom2 is competitive with that of reference-based methods. Furthermore, we find that RNA-Bloom2 requires 27.0 to 80.6% of the peak memory and 3.6 to 10.8% of the total wall-clock runtime of a competing reference-free method. Finally, we showcase RNA-Bloom2 in assembling a transcriptome sample of Picea sitchensis (Sitka spruce). Since our method does not rely on a reference, it further sets the groundwork for large-scale comparative transcriptomics where high-quality draft genome assemblies are not readily available.

Subjects

RNA , Transcriptome , Transcriptome/genetics , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods

13.

ntLink: A Toolkit for De Novo Genome Assembly Scaffolding and Mapping Using Long Reads.

Coombe, Lauren; Warren, René L; Wong, Johnathan; Nikolic, Vladimir; Birol, Inanc.

Curr Protoc ; 3(4): e733, 2023 Apr.

Article in English | MEDLINE | ID: mdl-37039735

ABSTRACT

With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step to a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes. ntLink is a flexible and resource-efficient genome scaffolding tool that utilizes long-read sequencing data to improve upon draft genome assemblies built from any sequencing technologies, including the same long reads. Instead of using read alignments to identify candidate joins, ntLink utilizes minimizer-based mappings to infer how input sequences should be ordered and oriented into scaffolds. Recent improvements to ntLink have added important features such as overlap detection, gap-filling, and in-code scaffolding iterations. Here, we present three basic protocols demonstrating how to use each of these new features to yield highly contiguous genome assemblies, while still maintaining ntLink's proven computational efficiency. Further, as we illustrate in the alternate protocols, the lightweight minimizer-based mappings that enable ntLink scaffolding can also be utilized for other downstream applications, such as misassembly detection. With its modularity and multiple modes of execution, ntLink has broad benefit to the genomics community, from genome scaffolding and beyond. ntLink is an open-source project and is freely available from https://github.com/bcgsc/ntLink. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: ntLink scaffolding using overlap detection
Basic Protocol 2: ntLink scaffolding with gap-filling
Basic Protocol 3: Running in-code iterations of ntLink scaffolding
Alternate Protocol 1: Generating long-read to contig mappings with ntLink
Alternate Protocol 2: Using ntLink mappings for genome assembly correction with Tigmint-long
Support Protocol: Installing ntLink
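
The minimizer-based mappings mentioned in this abstract rest on a general idea that can be sketched briefly. The code below is not ntLink's implementation; it is a minimal illustration of (w, k)-minimizer selection, and it orders k-mers lexicographically as a simplifying assumption (production tools typically order them by hash value). The function names are hypothetical.

```python
def minimizers(seq, k, w):
    """Return the sorted (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, the smallest k-mer
    (lexicographic order here, leftmost on ties) is selected.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        selected.add((start + offset, window[offset]))
    return sorted(selected)


def shared_minimizers(a, b, k, w):
    """Return the k-mers selected as minimizers in both sequences."""
    in_a = {kmer for _, kmer in minimizers(a, k, w)}
    in_b = {kmer for _, kmer in minimizers(b, k, w)}
    return in_a & in_b
```

Because only a subset of k-mers is kept per sequence, shared minimizers between a long read and a contig can suggest a mapping at far lower cost than a full alignment, which is the lightweight property the abstract emphasizes.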

Subjects

High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Sequence Analysis, DNA/methods , Genome

14.

Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim.

Yang, Chen; Lo, Theodora; Nip, Ka Ming; Hafezqorani, Saber; Warren, René L; Birol, Inanc.

Gigascience ; 12, 2023 03 20.

Article in English | MEDLINE | ID: mdl-36939007

ABSTRACT

BACKGROUND: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment. RESULTS: Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task. CONCLUSIONS: The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.

Subjects

Nanopore Sequencing , Nanopores , Metagenome , Nanopore Sequencing/methods , Sequence Analysis, DNA/methods , Computer Simulation , Metagenomics/methods , Software , Algorithms , High-Throughput Nucleotide Sequencing/methods

15.

Models and data of AMPlify: a deep learning tool for antimicrobial peptide prediction.

Li, Chenkai; Warren, René L; Birol, Inanc.

BMC Res Notes ; 16(1): 11, 2023 Feb 02.

Article in English | MEDLINE | ID: mdl-36732807

ABSTRACT

OBJECTIVES: Antibiotic resistance is a rising global threat to human health and is prompting researchers to seek effective alternatives to conventional antibiotics, which include antimicrobial peptides (AMPs). Recently, we have reported AMPlify, an attentive deep learning model for predicting AMPs in databases of peptide sequences. In our tests, AMPlify outperformed the state-of-the-art. We have illustrated its use on data describing the American bullfrog (Rana [Lithobates] catesbeiana) genome. Here we present the model files and training/test data sets we used in that study. The original model (the balanced model) was trained on a balanced set of AMP and non-AMP sequences curated from public databases. In this data note, we additionally provide a model trained on an imbalanced set, in which non-AMP sequences far outnumber AMP sequences. We note that the balanced and imbalanced models would serve different use cases, and both would serve the research community, facilitating the discovery and development of novel AMPs. DATA DESCRIPTION: This data note provides two sets of models, as well as two AMP and four non-AMP sequence sets for training and testing the balanced and imbalanced models. Each model set includes five single sub-models that form an ensemble model. The first model set corresponds to the original model trained on a balanced training set that has been described in the original AMPlify manuscript, while the second model set was trained on an imbalanced training set.

Subjects

Antimicrobial Peptides , Deep Learning , Animals , Amino Acid Sequence , Anti-Bacterial Agents , Rana catesbeiana/genetics

16.

Associating Biological Activity and Predicted Structure of Antimicrobial Peptides from Amphibians and Insects.

Richter, Amelia; Sutherland, Darcy; Ebrahimikondori, Hossein; Babcock, Alana; Louie, Nathan; Li, Chenkai; Coombe, Lauren; Lin, Diana; Warren, René L; Yanai, Anat; Kotkoff, Monica; Helbing, Caren C; Hof, Fraser; Hoang, Linda M N; Birol, Inanc.

Antibiotics (Basel) ; 11(12), 2022 Nov 27.

Article in English | MEDLINE | ID: mdl-36551368

ABSTRACT

Antimicrobial peptides (AMPs) are a diverse class of short, often cationic biological molecules that present promising opportunities in the development of new therapeutics to combat antimicrobial resistance. Newly developed in silico methods offer the ability to rapidly discover numerous novel AMPs with a variety of physiochemical properties. Herein, using the rAMPage AMP discovery pipeline, we bioinformatically identified 51 AMP candidates from amphibia and insect RNA-seq data and present their in-depth characterization. The studied AMPs demonstrate activity against a panel of bacterial pathogens and have undetected or low toxicity to red blood cells and human cultured cells. Amino acid sequence analysis revealed that 30 of these bioactive peptides belong to either the Brevinin-1, Brevinin-2, Nigrocin-2, or Apidaecin AMP families. Prediction of three-dimensional structures using ColabFold indicated an association between peptides predicted to adopt a helical structure and broad-spectrum antibacterial activity against the Gram-negative and Gram-positive species tested in our panel. These findings highlight the utility of associating the diverse sequences of novel AMPs with their estimated peptide structures in categorizing AMPs and predicting their antimicrobial activity.

17.

The western redcedar genome reveals low genetic diversity in a self-compatible conifer.

Shalev, Tal J; Gamal El-Dien, Omnia; Yuen, Macaire M S; Shengqiang, Shu; Jackman, Shaun D; Warren, René L; Coombe, Lauren; van der Merwe, Lise; Stewart, Ada; Boston, Lori B; Plott, Christopher; Jenkins, Jerry; He, Guifen; Yan, Juying; Yan, Mi; Guo, Jie; Breinholt, Jesse W; Neves, Leandro G; Grimwood, Jane; Rieseberg, Loren H; Schmutz, Jeremy; Birol, Inanc; Kirst, Matias; Yanchuk, Alvin D; Ritland, Carol; Russell, John H; Bohlmann, Joerg.

Genome Res ; 32(10): 1952-1964, 2022 10.

Article in English | MEDLINE | ID: mdl-36109148

ABSTRACT

We assembled the 9.8-Gbp genome of western redcedar (WRC; Thuja plicata), an ecologically and economically important conifer species of the Cupressaceae. The genome assembly, derived from a uniquely inbred tree produced through five generations of self-fertilization (selfing), was determined to be 86% complete by BUSCO analysis, one of the most complete genome assemblies for a conifer. Population genomic analysis revealed WRC to be one of the most genetically depauperate wild plant species, with an effective population size of approximately 300 and no significant genetic differentiation across its geographic range. Nucleotide diversity, π, is low for a continuous tree species, with many loci showing zero diversity, and the ratio of π at zero- to fourfold degenerate sites is relatively high (approximately 0.33), suggestive of weak purifying selection. Using an array of genetic lines derived from up to five generations of selfing, we explored the relationship between genetic diversity and mating system. Although overall heterozygosity was found to decline faster than expected during selfing, heterozygosity persisted at many loci, and nearly 100 loci were found to deviate from expectations of genetic drift, suggestive of associative overdominance. Nonreference alleles at such loci often harbor deleterious mutations and are rare in natural populations, implying that balanced polymorphisms are maintained by linkage to dominant beneficial alleles. This may account for how WRC remains responsive to natural and artificial selection, despite low genetic diversity.
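
The abstract compares observed heterozygosity with what is "expected" under selfing. As a point of reference, the standard neutral expectation is that complete self-fertilization halves heterozygosity each generation, so after t generations H_t = H_0 × (1/2)^t. The short sketch below computes that baseline; the function name is an illustrative assumption, and the study's own expectation model may include further details not stated in the abstract.

```python
def expected_heterozygosity(h0, generations):
    """Neutral expectation under complete selfing: heterozygosity halves each generation."""
    return h0 * 0.5 ** generations
```

Under this baseline, five generations of selfing would be expected to leave 1/32 (about 3%) of the starting heterozygosity, which gives a sense of scale for loci where heterozygosity persists.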

Subjects

Tracheophyta , Tracheophyta/genetics , Self-Fertilization/genetics , Alleles , Heterozygote , Polymorphism, Genetic , Genetic Variation , Selection, Genetic

18.

ntHash2: recursive spaced seed hashing for nucleotide sequences.

Kazemi, Parham; Wong, Johnathan; Nikolic, Vladimir; Mohamadi, Hamid; Warren, René L; Birol, Inanç.

Bioinformatics ; 38(20): 4812-4813, 2022 10 14.

Article in English | MEDLINE | ID: mdl-36000872

ABSTRACT

MOTIVATION: Spaced seeds are robust alternatives to k-mers in analyzing nucleotide sequences with high base mismatch rates. Hashing is also crucial for efficiently storing abundant sequence data. Here, we introduce ntHash2, a fast algorithm for spaced seed hashing that can be integrated into various bioinformatics tools for efficient sequence analysis with applications in genome research. RESULTS: ntHash2 is up to 2.1× faster at hashing various spaced seeds than the previous version and 3.8× faster than conventional hashing algorithms with naïve adaptation. Additionally, we reduced the collision rate of ntHash for longer k-mer lengths and improved the uniformity of the hash distribution by modifying the canonical hashing mechanism. AVAILABILITY AND IMPLEMENTATION: ntHash2 is freely available online at github.com/bcgsc/ntHash under an MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
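
To make the notion of a spaced seed concrete, the sketch below shows a naive, non-recursive way to hash spaced seeds. It is not the ntHash2 algorithm. A seed pattern is a string of `1` (care) and `0` (don't care) positions, and only bases at care positions contribute to the hash. The use of `zlib.crc32` as the hash function and the helper names are assumptions chosen for illustration.

```python
import zlib


def masked_kmer(kmer, seed):
    """Keep only the bases of kmer at positions where seed has a '1'."""
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length")
    return "".join(base for base, flag in zip(kmer, seed) if flag == "1")


def spaced_seed_hashes(seq, seed):
    """Hash every window of seq under the seed pattern, recomputing each from scratch."""
    k = len(seed)
    return [
        zlib.crc32(masked_kmer(seq[i:i + k], seed).encode())
        for i in range(len(seq) - k + 1)
    ]
```

Two windows that differ only at a don't-care position hash identically, which is why spaced seeds tolerate base mismatches. This baseline rebuilds every masked k-mer in full; avoiding that repeated work is the kind of cost a recursive scheme aims to reduce.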

Subjects

Algorithms, Software, Base Sequence, Seeds, DNA Sequence Analysis

19.

Mining Amphibian and Insect Transcriptomes for Antimicrobial Peptide Sequences with rAMPage.

Lin, Diana; Sutherland, Darcy; Aninta, Sambina Islam; Louie, Nathan; Nip, Ka Ming; Li, Chenkai; Yanai, Anat; Coombe, Lauren; Warren, René L; Helbing, Caren C; Hoang, Linda M N; Birol, Inanc.

Antibiotics (Basel) ; 11(7), 2022 Jul 15.

Article in English | MEDLINE | ID: mdl-35884206

ABSTRACT

Antibiotic resistance is a global health crisis increasing in prevalence every day. To combat this crisis, alternative antimicrobial therapeutics are urgently needed. Antimicrobial peptides (AMPs), a family of short defense proteins, are produced naturally by all organisms and hold great potential as effective alternatives to small molecule antibiotics. Here, we present rAMPage, a scalable bioinformatics discovery platform for identifying AMP sequences from RNA sequencing (RNA-seq) datasets. In our study, we demonstrate the utility and scalability of rAMPage, running it on 84 publicly available RNA-seq datasets from 75 amphibian and insect species, taxa known to have rich AMP repertoires. Across these datasets, we identified 1137 putative AMPs, 1024 of which were deemed novel by a homology search against cataloged AMPs in public databases. We selected 21 peptide sequences from this set for antimicrobial susceptibility testing against Escherichia coli and Staphylococcus aureus and observed that seven of them have high antimicrobial activity. Our study illustrates how in silico methods such as rAMPage can enable the fast and efficient discovery of novel antimicrobial peptides as an effective first step in the strenuous process of antimicrobial drug development.

20.

Spruce giga-genomes: structurally similar yet distinctive with differentially expanding gene families and rapidly evolving genes.

Gagalova, Kristina K; Warren, René L; Coombe, Lauren; Wong, Johnathan; Nip, Ka Ming; Yuen, Macaire Man Saint; Whitehill, Justin G A; Celedon, Jose M; Ritland, Carol; Taylor, Greg A; Cheng, Dean; Plettner, Patrick; Hammond, S Austin; Mohamadi, Hamid; Zhao, Yongjun; Moore, Richard A; Mungall, Andrew J; Boyle, Brian; Laroche, Jérôme; Cottrell, Joan; Mackay, John J; Lamothe, Manuel; Gérardi, Sébastien; Isabel, Nathalie; Pavy, Nathalie; Jones, Steven J M; Bohlmann, Joerg; Bousquet, Jean; Birol, Inanc.

Plant J ; 111(5): 1469-1485, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35789009

ABSTRACT

Spruces (Picea spp.) are coniferous trees widespread in boreal and mountainous forests of the northern hemisphere, with large economic significance and enormous contributions to global carbon sequestration. Spruces harbor very large genomes with high repetitiveness, hampering their comparative analysis. Here, we present and compare the genomes of four different North American spruces: the genome assemblies for Engelmann spruce (Picea engelmannii) and Sitka spruce (Picea sitchensis) together with improved and more contiguous genome assemblies for white spruce (Picea glauca) and for a naturally occurring introgress of these three species known as interior spruce (P. engelmannii × glauca × sitchensis). The genomes were structurally similar, and a large part of scaffolds could be anchored to a genetic map. The composition of the interior spruce genome indicated asymmetric contributions from the three ancestral genomes. Phylogenetic analysis of the nuclear and organelle genomes revealed a topology indicative of ancient reticulation. Different patterns of expansion of gene families among genomes were observed and related to presumed diversifying ecological adaptations. We identified rapidly evolving genes that harbored high rates of non-synonymous polymorphisms relative to synonymous ones, indicative of positive selection and its hitchhiking effects. These gene sets were mostly distinct between the genomes of ecologically contrasted species, and signatures of convergent balancing selection were detected. Stress and stimulus response was identified as the most frequent function assigned to expanding gene families and rapidly evolving genes. These two aspects of genomic evolution were complementary in their contribution to divergent evolution of presumed adaptive nature.
These more contiguous spruce giga-genome sequences should strengthen our understanding of conifer genome structure and evolution, as their comparison offers clues into the genetic basis of adaptation and ecology of conifers at the genomic level. They will also provide tools to better monitor natural genetic diversity and improve the management of conifer forests. The genomes of four closely related North American spruces indicate that their high similarity at the morphological level is paralleled by the high conservation of their physical genome structure. Yet, the evidence of divergent evolution is apparent in their rapidly evolving genomes, supported by differential expansion of key gene families and large sets of genes under positive selection, largely in relation to stimulus and environmental stress response.

Subjects

Picea, Tracheophyta, Expressed Sequence Tags, Plant Genome/genetics, Multigene Family/genetics, Phylogeny, Picea/genetics, Tracheophyta/genetics
